Abstract



Given two genomes with duplicate genes, Zero Exemplar Distance is the problem of 
deciding whether the two genomes can be reduced to the same genome without duplicate genes 
by deleting all but one copy of each gene in each genome. Blin, Fertin, Sikora, and Vialette 
recently proved that Zero Exemplar Distance for monochromosomal genomes is NP-hard 
even if each gene appears at most two times in each genome, thereby settling an important open 
question on genome rearrangement in the exemplar model. In this paper, we give a very simple 
alternative proof of this result. We also study the problem Zero Exemplar Distance for 
multichromosomal genomes without gene order, and prove the analogous result that it is also 
NP-hard even if each gene appears at most two times in each genome. For the positive direction, 
we show that both variants of Zero Exemplar Distance admit polynomial-time algorithms if 
each gene appears exactly once in one genome and at least once in the other genome. In addition, 
we present a polynomial-time algorithm for the related problem Exemplar Longest Common 
Subsequence in the special case that each mandatory symbol appears exactly once in one 
input sequence and at least once in the other input sequence. This answers an open question of 
Bonizzoni et al. We also show that Zero Exemplar Distance for multichromosomal genomes 
without gene order is fixed-parameter tractable if the parameter is the maximum number of 
chromosomes in each genome. 

1 Introduction 

Given two genomes with duplicate genes, Genome Rearrangement with Gene Families [12] 
is the problem of deleting all but one copy of each gene in each genome, so as to minimize some 
rearrangement distance between the two reduced genomes. The minimum rearrangement distance 
thus attained is called the exemplar distance between the two genomes. For example, each of the 
following two monochromosomal genomes 
has at most two copies of each gene, and each of the following two reduced genomes
has exactly one copy of each gene. Recall that in the study of genome rearrangement, a gene is
usually represented by a signed integer: the absolute value of the integer (the unsigned integer) 
denotes the gene family to which the gene belongs; the sign of the integer denotes the orientation 
of the gene in its chromosome. Then a chromosome is a sequence of signed integers, and a genome 
is a collection of chromosomes. 

Genome Rearrangement with Gene Families is not a single problem but a whole class of 
related problems, because the choice of rearrangement distance is not unique. This choice becomes 
irrelevant, however, when we ask the fundamental question: Is the distance zero? In the example 
above, the two reduced genomes $G'_1$ and $G'_2$ are identical, thus the exemplar distance between the
two original genomes $G_1$ and $G_2$ is zero for any reasonable choice of rearrangement distance.

In this paper, we study the most basic version of the problem Genome Rearrangement 
with Gene Families: Given two sequences of signed integers, Zero Exemplar Distance (for 
monochromosomal genomes) is the problem of deciding whether the two sequences have a common
subsequence including each unsigned integer exactly once in either positive or negative form. 
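
As a minimal sketch of this definition (not an algorithm from the paper), the following Python brute force enumerates every way of keeping one copy per gene family in the first genome and tests whether the result is a subsequence of the second. The function names and the encoding of a genome as a list of signed integers are assumptions made for illustration, and the running time is exponential, so it is only meant for tiny inputs.

```python
from itertools import product


def is_subsequence(short, long):
    """Return True if `short` is a subsequence of `long` (one left-to-right scan)."""
    it = iter(long)
    return all(any(x == y for y in it) for x in short)


def zero_exemplar_distance_bruteforce(g1, g2):
    """Decide Zero Exemplar Distance for two signed sequences by exhaustive search.

    Hypothetical helper: tries every exemplar of g1 (one occurrence per gene
    family) and checks whether it is also a subsequence of g2.
    """
    families = sorted({abs(g) for g in g1})
    if families != sorted({abs(g) for g in g2}):
        return False  # a family missing from one genome can never be matched
    # Positions of each gene family in g1.
    positions = [[i for i, g in enumerate(g1) if abs(g) == f] for f in families]
    for choice in product(*positions):
        exemplar = [g1[i] for i in sorted(choice)]
        if is_subsequence(exemplar, g2):
            return True
    return False
```

The inner `is_subsequence` scan is also the linear-time test that suffices when only one of the two genomes has duplicate genes.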

Due to its generic nature, the problem Zero Exemplar Distance has been extensively studied
by several groups of researchers [5j HI [2] focusing on different rearrangement distances, and,
not surprisingly, has acquired several different names. Except for trivial distinctions, Zero Exemplar
Distance is essentially the same problem as Zero Exemplar Conserved Interval
Distance [5], Exemplar Longest Common Subsequence (deciding whether a feasible solution 
exists) [3], and Zero Exemplar Breakpoint Distance [2]. 

It is easy to check that if only one of the two genomes has duplicate genes, then Zero Exemplar 
Distance can be solved in linear time: we simply need to decide whether the genome without 
duplicates is a subsequence of the genome with duplicates. In sharp contrast, if both genomes 
contain duplicate genes, then even if each gene appears at most three times in each genome, 
the problem Zero Exemplar Distance is already NP-hard, as shown independently in three 
papers [SHU [2]. The quest for the exact boundary between polynomial solvability and NP-hardness 
led to the following open question first raised by Chen et al. in 2006: 

Question 1 (Chen, Fowler, Fu, and Zhu, 2006 [5]). Is the problem Zero Exemplar Distance for
monochromosomal genomes still NP-hard if each gene appears at most two times in each genome? 

This question was finally settled in the affirmative by Blin et al. in 2009: 

Theorem 1 (Blin, Fertin, Sikora, and Vialette, 2009 [3]). Zero Exemplar Distance for monochro- 
mosomal genomes is NP-hard even if each gene appears at most two times in each genome. 

In Section 2, we give a very simple alternative proof of this theorem.

Both the previous proof of Theorem 1 [3] and our alternative proof depend crucially on the
order of the genes in the chromosomes. One may naturally wonder whether the complexity of Zero 
Exemplar Distance would change if gene order is not known. Note that genome rearrangement 
distances such as the syntenic distance [8] can be defined in the absence of gene order. 

Now model each chromosome as a set of unsigned integers instead of a sequence of signed 
integers. Then Zero Exemplar Distance for multichromosomal genomes without gene order 
is the following problem: Given two collections $G_1$ and $G_2$ of subsets of the same ground set $S$
of unsigned integers, decide whether both $G_1$ and $G_2$ can be reduced, by deleting elements from
subsets and deleting subsets from collections, to the same collection $G'$ of subsets of $S$ such that
each unsigned integer in $S$ is contained in exactly one subset in $G'$, i.e., $G'$ is a partition of $S$. For
example, 



In Section 3, we prove the following theorem analogous to Theorem 1:

Theorem 2. Zero Exemplar Distance for multichromosomal genomes without gene order is 
NP-hard even if each gene appears at most two times in each genome. 

As decision problems, both variants of Zero Exemplar Distance, for monochromosomal
genomes and for multichromosomal genomes without gene order, are in NP. Thus, following the
NP-hardness results in Theorem 1 and Theorem 2, these two decision problems are both NP-complete.
Moreover, the NP-hardness results in Theorem 1 and Theorem 2 imply that unless
NP = P, the corresponding minimization problems of computing the exemplar distance between 
two genomes do not admit any approximation. We refer to [5j El [H [21 [Q for related results. 

The problem Zero Exemplar Distance for monochromosomal genomes, as mentioned earlier,
has been studied under several different names. Given two sequences $A$ and $B$ over an alphabet
$\Sigma = \Sigma_1 \cup \Sigma_2$, where $\Sigma_1$ is a set of mandatory symbols and $\Sigma_2$ is a set of optional symbols, Exemplar
Longest Common Subsequence [I] is the problem of finding a longest common subsequence of
$A$ and $B$ that contains all mandatory symbols in $\Sigma_1$. For example, if $\Sigma_1 = \{1, 2, 3\}$ and $\Sigma_2 = \{4, 5\}$,
then $C = 124355$ is an exemplar longest common subsequence of the two sequences $A = 12423545$
and $B = 1142443555$.
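
The example can be checked mechanically. The sketch below is an illustration only, with hypothetical helper names: it verifies that $C$ is a common subsequence of $A$ and $B$, that it contains every mandatory symbol, and that its length matches the length of an unrestricted longest common subsequence, which is enough to conclude that no longer feasible solution exists.

```python
def is_subsequence(short, long):
    """Return True if `short` is a subsequence of `long`."""
    it = iter(long)
    return all(any(x == y for y in it) for x in short)


def lcs_length(a, b):
    """Length of a longest common subsequence (standard O(nm) DP, rolling row)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            # cur[j - 1] is the entry just to the left in the current row.
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


A, B, C = "12423545", "1142443555", "124355"
MANDATORY = set("123")


def example_is_exemplar_lcs():
    """Check the three conditions for the example in the text."""
    return (
        is_subsequence(C, A)
        and is_subsequence(C, B)
        and MANDATORY <= set(C)
        and len(C) == lcs_length(A, B)
    )
```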

Due to the strict requirement on mandatory symbols, Exemplar Longest Common Subsequence
does not always have a feasible solution. It is not difficult to see that simply deciding
whether a feasible solution to Exemplar Longest Common Subsequence exists for two sequences
$A$ and $B$ is the same as the problem Zero Exemplar Distance for two monochromosomal
genomes $A'$ and $B'$ obtained from $A$ and $B$ by deleting all optional symbols. Recall that
the problem Zero Exemplar Distance for monochromosomal genomes becomes trivial when 
only one of the two genomes has duplicate genes. For the equivalent problem of deciding whether 
a feasible solution to Exemplar Longest Common Subsequence exists, Bonizzoni et al. [I] 
showed another tractable special case: If each mandatory symbol appears a total of at most three 
times in A and B, then there is a polynomial-time algorithm, based on 2SAT, that decides whether 
A and B have a common subsequence containing all mandatory symbols. This algorithm does not 
solve the maximization problem, however, and the following question was left open: 

Question 2 (Bonizzoni et al. [4]). Is there a polynomial-time algorithm for Exemplar Longest 
Common Subsequence in the special case that each mandatory symbol appears a total of at most 
three times in the two input sequences? 

Without loss of generality, we assume that each input sequence contains each symbol in the 
alphabet at least once. If each mandatory symbol appears a total of at most three times in the 
two input sequences, then it must appear exactly once in one sequence, and at least once in the 
other sequence, as in the example shown earlier. In Section 4, we prove the following theorem that
complements Theorem 1 and answers the open question of Bonizzoni et al. in the affirmative:

Theorem 3. Zero Exemplar Distance for monochromosomal genomes admits a polynomial-time
algorithm in the special case that each gene appears exactly once in one genome and at least
once in the other genome. Exemplar Longest Common Subsequence admits a polynomial-time
algorithm in the special case that each mandatory symbol appears exactly once in one input
sequence and at least once in the other input sequence.

Finally, in Section 5, we prove the following theorem that complements Theorem 2:

Theorem 4. Zero Exemplar Distance for multichromosomal genomes without gene order admits
a polynomial-time algorithm in the special case that each gene appears exactly once in one
genome and at least once in the other genome, and is fixed-parameter tractable if the parameter is
the maximum number of chromosomes in each genome.

2 Alternative Proof of Theorem 1

We prove that Zero Exemplar Distance for monochromosomal genomes is NP-hard by a reduction
from the well-known NP-complete problem 3SAT [9]. Let $(V, E)$ be a 3SAT instance, where
$V = \{v_1, \ldots, v_n\}$ is a set of $n$ boolean variables, $E = \{e_1, \ldots, e_m\}$ is a conjunctive boolean formula
of $m$ clauses, and each clause in $E$ is a disjunction of exactly three literals of the variables in $V$.
We will construct two sequences (genomes) $G_1$ and $G_2$ over $2n + 6m + 1$ distinct unsigned integers
(genes):

• Two variable genes $x_i, y_i$ for each variable $v_i$, $1 \le i \le n$;

• Three clause genes $a_j, b_j, c_j$ for each clause $e_j$, $1 \le j \le m$;

• Three literal genes $r_j, s_j, t_j$ for the three literals of each clause $e_j$, $1 \le j \le m$;

• One separator gene $z$.

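In our construction, all genes appear in the positive orientation in the two genomes, so we will
omit the signs in our description. The two genomes $G_1$ and $G_2$ are represented schematically as
follows:

$G_1$ : $\langle v_1 \rangle \ldots \langle v_n \rangle \;\; z \;\; \langle e_1 \rangle \ldots \langle e_m \rangle$
$G_2$ : $\langle v_1 \rangle \ldots \langle v_n \rangle \;\; z \;\; \langle e_1 \rangle \ldots \langle e_m \rangle$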

For each variable $v_i$, the variable gadget $\langle v_i \rangle$ consists of one copy of $x_i$ and two copies of $y_i$ in
$G_1$, two copies of $x_i$ and one copy of $y_i$ in $G_2$, and, for each literal of the variable in the clauses,
one copy of the corresponding literal gene ($r_j$, $s_j$, or $t_j$ for some clause $e_j$) in each genome. Let
$p_{i,1}, \ldots, p_{i,k_i}$ be the literal genes for the positive literals of $v_i$, and let $q_{i,1}, \ldots, q_{i,l_i}$ be the literal genes
for the negative literals of $v_i$. The genes $x_i, y_i, p_{i,1}, \ldots, p_{i,k_i}, q_{i,1}, \ldots, q_{i,l_i}$ in the variable gadget $\langle v_i \rangle$
are arranged in the following pattern in the two genomes:

$G_1 \langle v_i \rangle$ : $y_i \;\; p_{i,1} \ldots p_{i,k_i} \;\; x_i \;\; q_{i,1} \ldots q_{i,l_i} \;\; y_i$
$G_2 \langle v_i \rangle$ : $p_{i,1} \ldots p_{i,k_i} \;\; x_i \;\; y_i \;\; x_i \;\; q_{i,1} \ldots q_{i,l_i}$






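For each clause $e_j$, the clause gadget $\langle e_j \rangle$ consists of two copies of each clause gene $a_j, b_j, c_j$ and
one copy of each literal gene $r_j, s_j, t_j$. These genes in $\langle e_j \rangle$ are arranged in the following pattern in
the two genomes:

$G_1 \langle e_j \rangle$ : $r_j \; a_j \; b_j \; c_j \; s_j \; a_j \; b_j \; c_j \; t_j$
$G_2 \langle e_j \rangle$ : $a_j \; r_j \; b_j \; a_j \; s_j \; c_j \; b_j \; t_j \; c_j$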

This completes the construction. It is easy to check that each gene appears at most two times 
in each genome, and that each genome includes exactly 3n + 12m + 1 genes including duplicates. 
We give an example: 

Example 1. For a 3SAT instance of 4 variables and 2 clauses $e_1 = \{r_1 = v_1, s_1 = \neg v_2, t_1 = \neg v_3\}$
and $e_2 = \{r_2 = \neg v_1, s_2 = v_3, t_2 = v_4\}$, the reduction constructs the following two genomes:

$G_1$ : $y_1 r_1 x_1 r_2 y_1 \;\; y_2 x_2 s_1 y_2 \;\; y_3 s_2 x_3 t_1 y_3 \;\; y_4 t_2 x_4 y_4$

$z \;\; r_1 a_1 b_1 c_1 s_1 a_1 b_1 c_1 t_1 \;\; r_2 a_2 b_2 c_2 s_2 a_2 b_2 c_2 t_2$

$G_2$ : $r_1 x_1 y_1 x_1 r_2 \;\; x_2 y_2 x_2 s_1 \;\; s_2 x_3 y_3 x_3 t_1 \;\; t_2 x_4 y_4 x_4$

$z \;\; a_1 r_1 b_1 a_1 s_1 c_1 b_1 t_1 c_1 \;\; a_2 r_2 b_2 a_2 s_2 c_2 b_2 t_2 c_2$

The assignment $v_1$ = true, $v_2$ = false, $v_3$ = false, $v_4$ = true satisfies the 3SAT instance and
corresponds to the following common reduced genome:

$G'$ : $r_1 x_1 y_1 \;\; y_2 x_2 s_1 \;\; y_3 x_3 t_1 \;\; t_2 x_4 y_4 \;\; z \;\; a_1 b_1 c_1 \;\; r_2 a_2 s_2 b_2 c_2$
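
The construction can be sketched in a few lines of Python. This is an illustration under assumed conventions rather than code from the paper: the function name `build_genomes` is hypothetical, a clause is a triple of nonzero integers (`+i` for $v_i$, `-i` for its negation), and genes are written as strings such as `"x1"` or `"r2"`.

```python
def build_genomes(n, clauses):
    """Build the two sequences G1 and G2 of the reduction from a 3SAT instance.

    n       -- number of variables v_1..v_n
    clauses -- list of 3-tuples of nonzero ints (assumed encoding of literals)
    """
    pos = {i: [] for i in range(1, n + 1)}  # literal genes of positive literals
    neg = {i: [] for i in range(1, n + 1)}  # literal genes of negative literals
    for j, clause in enumerate(clauses, 1):
        for name, lit in zip("rst", clause):
            (pos if lit > 0 else neg)[abs(lit)].append(f"{name}{j}")

    g1, g2 = [], []
    for i in range(1, n + 1):
        x, y = f"x{i}", f"y{i}"
        g1 += [y] + pos[i] + [x] + neg[i] + [y]  # y p.. x q.. y
        g2 += pos[i] + [x, y, x] + neg[i]        # p.. x y x q..
    g1.append("z")  # separator gene
    g2.append("z")
    for j in range(1, len(clauses) + 1):
        a, b, c, r, s, t = (f"{ch}{j}" for ch in "abcrst")
        g1 += [r, a, b, c, s, a, b, c, t]
        g2 += [a, r, b, a, s, c, b, t, c]
    return g1, g2
```

Calling `build_genomes(4, [(1, -2, -3), (-1, 3, 4)])` reproduces the two genomes of Example 1, each of length $3n + 12m + 1 = 37$.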

The reduction clearly runs in polynomial time. It remains to prove the following lemma: 

Lemma 1. The 3SAT instance $(V, E)$ is satisfiable if and only if the two genomes $G_1$ and $G_2$ have
a common subsequence $G'$ including exactly one copy of each gene.

We first prove the direct implication. Suppose that the 3SAT instance $(V, E)$ is satisfiable. We
will compose a common subsequence $G'$ of the two genomes $G_1$ and $G_2$ from a common subsequence
of each variable gadget $\langle v_i \rangle$, the separator gene $z$ in the middle, and a common subsequence of each
clause gadget $\langle e_j \rangle$. Consider a truth assignment that satisfies the 3SAT instance. For each variable
$v_i$, take the subsequence $p_{i,1} \ldots p_{i,k_i} \, x_i \, y_i$ if $v_i$ is set to true, and take the subsequence $y_i \, x_i \, q_{i,1} \ldots q_{i,l_i}$
if $v_i$ is set to false. For each clause $e_j$, at least one of its three literals is true; correspondingly, at
least one of the three literal genes $r_j, s_j, t_j$ has been taken from some variable gadget $\langle v_i \rangle$. Now
take a subsequence from the clause gadget $\langle e_j \rangle$ following one of three cases:

1. If $r_j$ has been taken, then take the subsequence $a_j \, b_j \, \underline{s_j} \, c_j \, \underline{t_j}$.

2. If $s_j$ has been taken, then take either the subsequence $\underline{r_j} \, b_j \, a_j \, c_j \, \underline{t_j}$ or the subsequence $\underline{r_j} \, a_j \, c_j \, b_j \, \underline{t_j}$.

3. If $t_j$ has been taken, then take the subsequence $\underline{r_j} \, a_j \, \underline{s_j} \, b_j \, c_j$.

Here an underlined literal gene is omitted from the subsequence taken from the clause gadget $\langle e_j \rangle$
if its other copy has already been taken from some variable gadget $\langle v_i \rangle$. The common subsequence
$G'$ thus composed clearly includes exactly one copy of each gene.

We next prove the reverse implication. Suppose that the two genomes $G_1$ and $G_2$ have a common
subsequence $G'$ including exactly one copy of each gene. We will find a satisfying assignment for
the 3SAT instance $(V, E)$ as follows. Due to the strategic location of the separator gene $z$ in the
two genomes, each literal gene must appear in the common subsequence either before $z$ in both
genomes, in some variable gadget $\langle v_i \rangle$, or after $z$ in both genomes, in some clause gadget $\langle e_j \rangle$. The
crucial property of the clause gadget $\langle e_j \rangle$ is that it cannot have a common subsequence including
exactly one copy of each clause gene $a_j, b_j, c_j$ unless at least one of the three literal genes $r_j, s_j, t_j$
is omitted. A literal gene omitted from the common subsequence of the clause gadget $\langle e_j \rangle$ has to
appear in the common subsequence of some variable gadget $\langle v_i \rangle$, where the two variable genes $x_i$
and $y_i$ must appear in the order $x_i y_i$ if the literal is positive and appear in the order $y_i x_i$ if the
literal is negative. Now set each variable $v_i$ to true if the two variable genes $x_i$ and $y_i$ appear in
the common subsequence $G'$ in the order $x_i y_i$, and set it to false otherwise. Then each clause gets
at least one true literal. This completes the proof of Theorem 1.
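
The crucial property of the clause gadget is small enough to confirm by exhaustive search. The sketch below is an illustration only, not part of the proof: it drops the index $j$, writes the two gadget patterns as strings, and uses a hypothetical helper `has_exemplar_common_subsequence` to test whether a common subsequence with exactly one copy of each requested gene exists.

```python
from itertools import combinations

G1_CLAUSE = "rabcsabct"  # r a b c s a b c t
G2_CLAUSE = "arbascbtc"  # a r b a s c b t c


def is_subsequence(short, long):
    """Return True if `short` is a subsequence of `long`."""
    it = iter(long)
    return all(any(x == y for y in it) for x in short)


def has_exemplar_common_subsequence(required):
    """True if the two gadgets share a subsequence that contains exactly one
    copy of each gene in `required` and nothing else."""
    target = sorted(required)
    for idx in combinations(range(len(G1_CLAUSE)), len(target)):
        cand = [G1_CLAUSE[i] for i in idx]
        if sorted(cand) == target and is_subsequence(cand, G2_CLAUSE):
            return True
    return False
```

With all six genes `"abcrst"` the search fails, while omitting any single literal gene (`"abcst"`, `"abcrt"`, or `"abcrs"`) succeeds, matching the three cases of the direct implication.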

3 Proof of Theorem 2

We prove that Zero Exemplar Distance for multichromosomal genomes without gene order is
NP-hard by a reduction again from 3SAT. Let $(V, E)$ be a 3SAT instance, where $V = \{v_1, \ldots, v_n\}$
is a set of $n$ boolean variables, $E = \{e_1, \ldots, e_m\}$ is a conjunctive boolean formula of $m$ clauses,
and each clause in $E$ is a disjunction of exactly three literals of the variables in $V$. Without loss
of generality, assume that no clause in $E$ contains two literals of the same variable in $V$. We will
construct two genomes $G_1$ and $G_2$ over $n + 9m$ distinct genes:

• One variable gene $x_i$ for each variable $v_i$, $1 \le i \le n$;

• Six clause genes $a_j, b_j, c_j, a'_j, b'_j, c'_j$ for each clause $e_j$, $1 \le j \le m$;

• Three literal genes $r_j, s_j, t_j$ for the three literals of each clause $e_j$, $1 \le j \le m$.

For each variable $v_i$, let $p_{i,1}, \ldots, p_{i,k_i}$ be the literal genes for the positive literals of $v_i$, and
let $q_{i,1}, \ldots, q_{i,l_i}$ be the literal genes for the negative literals of $v_i$. $G_1$ includes one subset and $G_2$
includes two subsets of genes including $x_i$:

$G_1 \langle v_i \rangle$ : $\{p_{i,1}, \ldots, p_{i,k_i}, x_i, q_{i,1}, \ldots, q_{i,l_i}\}$
$G_2 \langle v_i \rangle$ : $\{p_{i,1}, \ldots, p_{i,k_i}, x_i\} \;\; \{x_i, q_{i,1}, \ldots, q_{i,l_i}\}$

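For each clause $e_j$, $G_1$ includes six subsets and $G_2$ includes seven subsets of clause/literal genes:

$G_1 \langle e_j \rangle$ : $\{a_j, b_j\} \; \{b_j, c_j\} \; \{c_j, a_j\} \; \{a'_j, r_j\} \; \{b'_j, s_j\} \; \{c'_j, t_j\}$

$G_2 \langle e_j \rangle$ : $\{a_j, b_j, c_j\} \; \{a_j, a'_j, r_j\} \; \{b_j, b'_j, s_j\} \; \{c_j, c'_j, t_j\} \; \{a'_j\} \; \{b'_j\} \; \{c'_j\}$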

This completes the construction. It is easy to check that each gene appears at most two times
in each genome, $G_1$ includes exactly $n + 15m$ genes including duplicates, and $G_2$ includes exactly
$2n + 18m$ genes including duplicates. We give an example:

Example 2. For a 3SAT instance of 4 variables and 2 clauses $e_1 = \{r_1 = v_1, s_1 = \neg v_2, t_1 = \neg v_3\}$
and $e_2 = \{r_2 = \neg v_1, s_2 = v_3, t_2 = v_4\}$, the reduction constructs the following two genomes:

$G_1$ : $\{r_1, x_1, r_2\} \; \{x_2, s_1\} \; \{s_2, x_3, t_1\} \; \{t_2, x_4\}$

$\{a_1, b_1\} \; \{b_1, c_1\} \; \{c_1, a_1\} \; \{a'_1, r_1\} \; \{b'_1, s_1\} \; \{c'_1, t_1\}$

$\{a_2, b_2\} \; \{b_2, c_2\} \; \{c_2, a_2\} \; \{a'_2, r_2\} \; \{b'_2, s_2\} \; \{c'_2, t_2\}$

$G_2$ : $\{r_1, x_1\} \; \{x_1, r_2\} \; \{x_2\} \; \{x_2, s_1\} \; \{s_2, x_3\} \; \{x_3, t_1\} \; \{t_2, x_4\} \; \{x_4\}$

$\{a_1, b_1, c_1\} \; \{a_1, a'_1, r_1\} \; \{b_1, b'_1, s_1\} \; \{c_1, c'_1, t_1\} \; \{a'_1\} \; \{b'_1\} \; \{c'_1\}$

$\{a_2, b_2, c_2\} \; \{a_2, a'_2, r_2\} \; \{b_2, b'_2, s_2\} \; \{c_2, c'_2, t_2\} \; \{a'_2\} \; \{b'_2\} \; \{c'_2\}$

The assignment $v_1$ = true, $v_2$ = false, $v_3$ = false, $v_4$ = true satisfies the 3SAT instance and
corresponds to the following common reduced genome:

$G'$ : $\{r_1, x_1\} \; \{x_2, s_1\} \; \{x_3, t_1\} \; \{t_2, x_4\}$
$\{a_1\} \; \{b_1, c_1\} \; \{a'_1\} \; \{b'_1\} \; \{c'_1\}$
$\{c_2\} \; \{a_2, b_2\} \; \{a'_2, r_2\} \; \{b'_2, s_2\} \; \{c'_2\}$
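
The set-based construction can be sketched in the same way as the sequence-based one. Again this is an illustration under assumed conventions: `build_set_genomes` is a hypothetical name, literals are encoded as signed integers, and a primed gene such as $a'_1$ is written as the string `"a'1"`.

```python
def build_set_genomes(n, clauses):
    """Build the two collections of gene sets G1 and G2 from a 3SAT instance.

    n       -- number of variables v_1..v_n
    clauses -- list of 3-tuples of nonzero ints (+i for v_i, -i for its negation)
    """
    pos = {i: [] for i in range(1, n + 1)}
    neg = {i: [] for i in range(1, n + 1)}
    for j, clause in enumerate(clauses, 1):
        for name, lit in zip("rst", clause):
            (pos if lit > 0 else neg)[abs(lit)].append(f"{name}{j}")

    g1, g2 = [], []
    for i in range(1, n + 1):
        x = f"x{i}"
        g1.append(frozenset(pos[i] + [x] + neg[i]))  # one subset in G1
        g2.append(frozenset(pos[i] + [x]))           # two subsets in G2
        g2.append(frozenset([x] + neg[i]))
    for j in range(1, len(clauses) + 1):
        a, b, c, r, s, t = (f"{ch}{j}" for ch in "abcrst")
        a2, b2, c2 = f"a'{j}", f"b'{j}", f"c'{j}"    # primed clause genes
        g1 += [frozenset(p) for p in
               ([a, b], [b, c], [c, a], [a2, r], [b2, s], [c2, t])]
        g2 += [frozenset(p) for p in
               ([a, b, c], [a, a2, r], [b, b2, s], [c, c2, t], [a2], [b2], [c2])]
    return g1, g2
```

For the instance of Example 2, the total sizes of the subsets are $n + 15m = 34$ in $G_1$ and $2n + 18m = 44$ in $G_2$.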

The reduction clearly runs in polynomial time. It remains to prove the following lemma: 

Lemma 2. The 3SAT instance $(V, E)$ is satisfiable if and only if the two genomes $G_1$ and $G_2$ have
a common reduced genome $G'$ including exactly one copy of each gene.

We first prove the direct implication. Suppose that the 3SAT instance $(V, E)$ is satisfiable.
We will compose a common reduced genome $G'$ of the two genomes $G_1$ and $G_2$ as follows. Consider
a truth assignment that satisfies the 3SAT instance. For each variable $v_i$, take the subset
$\{p_{i,1}, \ldots, p_{i,k_i}, x_i\}$ if $v_i$ is set to true, and take the subset $\{x_i, q_{i,1}, \ldots, q_{i,l_i}\}$ if $v_i$ is set to false. For
each clause $e_j$, at least one of its three literals is true; correspondingly, at least one of the three
literal genes $r_j, s_j, t_j$ has been taken from some variable gadget $\langle v_i \rangle$. Now take some subsets of
clause/literal genes following one of three cases:

1. If $r_j$ has been taken, then take the subsets $\{a_j\}$, $\{b_j, c_j\}$, $\{a'_j\}$, $\{b'_j, \underline{s_j}\}$, $\{c'_j, \underline{t_j}\}$.

2. If $s_j$ has been taken, then take the subsets $\{b_j\}$, $\{c_j, a_j\}$, $\{a'_j, \underline{r_j}\}$, $\{b'_j\}$, $\{c'_j, \underline{t_j}\}$.

3. If $t_j$ has been taken, then take the subsets $\{c_j\}$, $\{a_j, b_j\}$, $\{a'_j, \underline{r_j}\}$, $\{b'_j, \underline{s_j}\}$, $\{c'_j\}$.

Here an underlined literal gene is omitted from the subset taken from the clause gadget $\langle e_j \rangle$ if its
other copy has already been taken from some variable gadget $\langle v_i \rangle$. The reduced genome $G'$ thus
composed clearly includes exactly one copy of each gene.

We next prove the reverse implication. Suppose that the two genomes $G_1$ and $G_2$ have a
common reduced genome $G'$ including exactly one copy of each gene. We will find a satisfying
assignment for the 3SAT instance $(V, E)$ as follows. The crucial property of the clause gadget $\langle e_j \rangle$
is that it cannot have a common reduced genome including exactly one copy of each clause gene
$a_j, b_j, c_j, a'_j, b'_j, c'_j$ unless at least one of the three literal genes $r_j, s_j, t_j$ is omitted. A literal gene
omitted from the clause gadget $\langle e_j \rangle$ has to appear in a subset in $G'$ that contains some variable
gene $x_i$. By the construction of the variable gadgets, this subset contains, besides $x_i$, either literal
genes for positive literals, or literal genes for negative literals. Now set each variable $v_i$ to true if
the subset in $G'$ that contains $x_i$ also contains at least one literal gene for a positive literal, and
set it to false otherwise. Then each clause gets at least one true literal. This completes the proof
of Theorem 2.

4 Proof of Theorem 3

Let $A$ and $B$ be two sequences of lengths $n$ and $m$, respectively, over an alphabet $\Sigma = \Sigma_1 \cup \Sigma_2$,
where $\Sigma_1$ is a set of mandatory symbols and $\Sigma_2$ is a set of optional symbols. In the special case
that each mandatory symbol in $\Sigma_1$ appears exactly once in one sequence and at least once in
the other sequence, we have the obvious but important property that any common subsequence
of the two sequences can contain each mandatory symbol at most once. This property leads to a
very simple algorithm that decides whether a feasible solution to Exemplar Longest Common
Subsequence exists in this special case:

Algorithm 1. 

1. Obtain two sequences $A'$ and $B'$ from $A$ and $B$ by deleting all optional symbols in $\Sigma_2$.

2. Compute a longest common subsequence $C^*$ of $A'$ and $B'$.

3. If $C^*$ contains all mandatory symbols in $\Sigma_1$, return yes. Otherwise, return no.
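
The three steps of Algorithm 1 can be sketched in Python as follows. The function names are hypothetical, and the sketch assumes the precondition of the special case (each mandatory symbol appears exactly once in one sequence and at least once in the other); it does not check that precondition.

```python
def lcs(a, b):
    """Return one longest common subsequence of a and b (standard O(nm) DP)."""
    n, m = len(a), len(b)
    # dp[i][j] = LCS length of the suffixes a[i:] and b[j:].
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    out, i, j = [], 0, 0
    while i < n and j < m:  # trace one optimal solution back out of the table
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return out


def has_feasible_solution(a, b, mandatory):
    """Algorithm 1: yes/no answer for the special case described above."""
    a1 = [x for x in a if x in mandatory]  # step 1: delete optional symbols
    b1 = [x for x in b if x in mandatory]
    c_star = lcs(a1, b1)                   # step 2: any longest common subsequence
    return set(c_star) == set(mandatory)   # step 3: all mandatory symbols present?
```

For instance, `has_feasible_solution("12423545", "1142443555", set("123"))` reduces the inputs to `1223` and `1123`, whose longest common subsequence `123` contains every mandatory symbol.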

The time complexity of Algorithm 1 is $O(nm)$ by using a standard dynamic programming
algorithm for longest common subsequence [10]. The correctness of Algorithm 1 is justified by the
following lemma:

Lemma 3. $A$ and $B$ have a common subsequence containing all mandatory symbols in $\Sigma_1$ if and
only if the longest common subsequence $C^*$ of $A'$ and $B'$ contains all mandatory symbols in $\Sigma_1$.

Proof. The reduction from $A$ and $B$ to $A'$ and $B'$ preserves the mandatory symbols. Thus $A$ and $B$
have a common subsequence containing all mandatory symbols in $\Sigma_1$ if and only if $A'$ and $B'$ have
a common subsequence containing all mandatory symbols in $\Sigma_1$. It remains to prove the equivalent
claim that $A'$ and $B'$ have a common subsequence containing all mandatory symbols in $\Sigma_1$ if and
only if $C^*$ contains all mandatory symbols in $\Sigma_1$.

The "if" direction of the claim is trivial because $C^*$ is a common subsequence of $A'$ and $B'$. To
prove the "only if" direction, recall that in any common subsequence of $A'$ and $B'$, each mandatory
symbol can appear at most once. Thus the length of any common subsequence of $A'$ and $B'$ is at
most the size of $\Sigma_1$. Moreover, if the length of some common subsequence of $A'$ and $B'$ is equal
to the size of $\Sigma_1$, then this common subsequence must contain all mandatory symbols in $\Sigma_1$, and
vice versa. Now suppose that $A'$ and $B'$ have a common subsequence $C$ containing all mandatory
symbols in $\Sigma_1$. Then the length of $C$ must be equal to the size of $\Sigma_1$. Since the length of $C^*$ is at
least the length of $C$, the length of $C^*$ must also be equal to the size of $\Sigma_1$. Then $C^*$ must contain
all mandatory symbols in $\Sigma_1$ too. This completes the proof. □

Since deciding whether a feasible solution to Exemplar Longest Common Subsequence
exists for two sequences $A$ and $B$ is the same as the problem Zero Exemplar Distance for two
monochromosomal genomes $A'$ and $B'$ obtained from $A$ and $B$ by deleting all optional symbols, we
also have an $O(nm)$ algorithm for Zero Exemplar Distance for monochromosomal genomes in
the special case that each gene appears exactly once in one genome and at least once in the other
genome.

We next present an algorithm for the maximization problem Exemplar Longest Common 
Subsequence in the special case that each mandatory symbol appears exactly once in one input 
sequence and at least once in the other input sequence: 

Algorithm 2. 

1. Assign each mandatory symbol in $\Sigma_1$ a weight of $w = \min\{n, m\} + 1$, and assign each optional
symbol in $\Sigma_2$ a weight of 1. Compute a common subsequence $C^*$ of $A$ and $B$ of the maximum
total weight.

2. If $C^*$ contains all mandatory symbols in $\Sigma_1$, return $C^*$. Otherwise, report that no feasible
solution exists.
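
A minimal Python sketch of Algorithm 2 follows. It uses the usual longest-common-subsequence table with symbol weights added, which is one straightforward way to compute a maximum-weight common subsequence; the function name `exemplar_lcs` and the convention of returning `None` when no feasible solution exists are assumptions for illustration.

```python
def exemplar_lcs(a, b, mandatory):
    """Algorithm 2 for the special case: return a maximum-weight common
    subsequence if it contains all mandatory symbols, otherwise None."""
    n, m = len(a), len(b)
    w = min(n, m) + 1  # weight of a mandatory symbol; optional symbols weigh 1

    def weight(x):
        return w if x in mandatory else 1

    # dp[i][j] = maximum total weight of a common subsequence of a[i:] and b[j:].
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            best = max(dp[i + 1][j], dp[i][j + 1])
            if a[i] == b[j]:
                best = max(best, dp[i + 1][j + 1] + weight(a[i]))
            dp[i][j] = best

    out, i, j = [], 0, 0
    while i < n and j < m:  # recover one maximum-weight common subsequence
        if a[i] == b[j] and dp[i][j] == dp[i + 1][j + 1] + weight(a[i]):
            out.append(a[i])
            i += 1
            j += 1
        elif dp[i][j] == dp[i + 1][j]:
            i += 1
        else:
            j += 1
    return out if set(mandatory) <= set(out) else None
```

On the earlier example with `A = "12423545"`, `B = "1142443555"`, and mandatory symbols `{1, 2, 3}`, the weight is $w = 9$ and the returned subsequence has length 6.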

If $A$ and $B$ have no common subsequence containing all mandatory symbols in $\Sigma_1$, then clearly
the maximum-weight common subsequence $C^*$ of $A$ and $B$ cannot contain all mandatory symbols
in $\Sigma_1$, and hence the algorithm correctly reports that no feasible solution exists. Otherwise, the
correctness of Algorithm 2 is justified by the following lemma:

Lemma 4. If $A$ and $B$ have a common subsequence containing all mandatory symbols in $\Sigma_1$, then
the maximum-weight common subsequence $C^*$ of $A$ and $B$ is a longest common subsequence of $A$
and $B$ that contains all mandatory symbols in $\Sigma_1$.

Proof. Suppose that $A$ and $B$ have a common subsequence $C$ containing all mandatory symbols
in $\Sigma_1$. We first show that the maximum-weight common subsequence $C^*$ of $A$ and $B$ contains all
mandatory symbols in $\Sigma_1$. Note that the number of optional symbols in $C^*$ is at most the length of
$C^*$, which is at most $\min\{n, m\}$. Also recall that any common subsequence of $A$ and $B$ can contain
each mandatory symbol at most once. If $C^*$ does not contain all mandatory symbols in $\Sigma_1$, then
by our choice of $w = \min\{n, m\} + 1$, the total weight of $C^*$ would be at most

$(|\Sigma_1| - 1) \cdot w + \min\{n, m\} \cdot 1 < (|\Sigma_1| - 1) \cdot w + w \cdot 1 = |\Sigma_1| \cdot w.$

On the other hand, since $C$ contains all mandatory symbols in $\Sigma_1$, the weight of $C$ is at least
$|\Sigma_1| \cdot w$. This contradicts the assumption that $C^*$ is a maximum-weight common subsequence of $A$
and $B$.

Now, since $C^*$ contains all mandatory symbols and can contain each mandatory symbol at most
once, $C^*$ must contain each mandatory symbol exactly once. Then, to have the maximum total
weight, $C^*$ must be a longest common subsequence of $A$ and $B$ that contains all mandatory symbols
in $\Sigma_1$. □

Again, the overall time complexity of Algorithm 2 is clearly $O(nm)$. This completes the proof
of Theorem 3.

5 Proof of Theorem 4

We present two algorithms for Zero Exemplar Distance for multichromosomal genomes without
gene order. Let $k_1$ and $k_2$, respectively, be the numbers of chromosomes in $G_1$ and $G_2$. Let
$A_1, \ldots, A_{k_1}$ be the $k_1$ chromosomes in $G_1$. Let $B_1, \ldots, B_{k_2}$ be the $k_2$ chromosomes in $G_2$. Let
$k = \max\{k_1, k_2\}$. Let $n$ be the total number of genes in $G_1$ and $G_2$, i.e., $n = \sum_{i=1}^{k_1} |A_i| + \sum_{j=1}^{k_2} |B_j|$.

We first present a polynomial-time algorithm for Zero Exemplar Distance for multichromosomal
genomes without gene order in the special case that each gene appears exactly once in
one genome and at least once in the other genome. Our algorithm is based on maximum-weight 
matching in bipartite graphs: 

Algorithm 3. 

1. Construct a complete bipartite graph $G = (V_1 \cup V_2, V_1 \times V_2)$ with vertices $V_1 = \{A_1, \ldots, A_{k_1}\}$
and $V_2 = \{B_1, \ldots, B_{k_2}\}$. Associate with each edge between $A_i \in V_1$ and $B_j \in V_2$ a reduced
chromosome $C_{ij} = A_i \cap B_j$ and a weight equal to its size.

2. Compute a maximum-weight matching $M$ in the graph $G$.

3. If the set of reduced chromosomes corresponding to the edges in $M$ includes all the genes,
return yes. Otherwise, return no.
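
The steps above can be sketched in Python as follows. The paper only asks for "a standard algorithm" for weighted bipartite matching; the sketch substitutes a textbook $O(k^3)$ Hungarian-style assignment routine, which is an assumption of this illustration, as are the function names and the encoding of a genome as a non-empty list of Python sets. Because all weights are nonnegative, restricting attention to matchings that saturate the smaller side loses nothing.

```python
def max_weight_assignment(w):
    """Hungarian algorithm on a weight matrix with r rows and c >= r columns.

    Returns a list `match` with match[i] = column assigned to row i, chosen to
    maximize the total weight (weights are negated to reuse the min-cost form).
    """
    r, c = len(w), len(w[0])
    inf = float("inf")
    u, v = [0] * (r + 1), [0] * (c + 1)  # dual potentials
    p = [0] * (c + 1)                     # p[j] = row matched to column j (1-based)
    way = [0] * (c + 1)
    for i in range(1, r + 1):
        p[0], j0 = i, 0
        minv = [inf] * (c + 1)
        used = [False] * (c + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, c + 1):
                if not used[j]:
                    cur = -w[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(c + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0:  # augment along the alternating path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    match = [None] * r
    for j in range(1, c + 1):
        if p[j]:
            match[p[j] - 1] = j - 1
    return match


def zero_exemplar_unordered_special(g1, g2):
    """Algorithm 3, assuming each gene occurs exactly once in one genome."""
    genes = set().union(*g1, *g2)
    rows, cols = (g1, g2) if len(g1) <= len(g2) else (g2, g1)
    w = [[len(a & b) for b in cols] for a in rows]  # step 1: edge weights
    match = max_weight_assignment(w)                # step 2: max-weight matching
    covered = set()
    for i, j in enumerate(match):
        covered |= rows[i] & cols[j]
    return covered == genes                         # step 3: all genes included?
```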

To see the correctness of Algorithm 3, note that each reduced chromosome of a common reduced
genome is a common subset of two distinct chromosomes, one from each input genome, and
corresponds to an edge of a matching in the complete bipartite graph. In the special case that each 
gene appears exactly once in one genome and at least once in the other genome, no gene can appear 
more than once in the reduced chromosomes corresponding to the edges of a matching. Thus the 
maximum possible weight of a matching is equal to the number of distinct genes, and a common 
reduced genome that includes all the genes corresponds to a matching of the maximum weight. 

We now analyze the time complexity of Algorithm 3. Steps 1 and 3 can be easily implemented
in $O(n^2)$ time. Step 2 can be implemented in $O(k^3)$ time using a standard algorithm for weighted
bipartite matching; see e.g. [13]. Thus the overall time complexity is $O(n^2 + k^3)$.

We next present a fixed-parameter tractable algorithm for this problem without any assumption 
on the distribution of duplicate genes. Refer to [7] for basic concepts in parameterized complexity 
theory. The parameter of our algorithm is $k = \max\{k_1, k_2\}$:

Algorithm 4. 

1. Add $k - k_1$ empty chromosomes $A_{k_1+1}, \ldots, A_k$ to $G_1$, or add $k - k_2$ empty chromosomes
$B_{k_2+1}, \ldots, B_k$ to $G_2$, such that $G_1$ and $G_2$ have the same number $k$ of chromosomes.

2. For each permutation $\pi$ of $(1, \ldots, k)$, compute $C_\pi = \bigcup_{i=1}^{k} (A_i \cap B_{\pi(i)})$.

3. If for some permutation $\pi$ the set $C_\pi$ includes all the genes, return yes. Otherwise return no.

To see the correctness of Algorithm 4, note again that each chromosome of a common reduced 
genome is a common subset of two distinct chromosomes, one from each input genome. All other 
chromosomes of the two input genomes that do not contribute to the common reduced genome are 
deleted. To handle the matching and the deletion of the chromosomes in a uniform way, we can 
think of each chromosome deleted from one genome as matched to a chromosome deleted from the 
other genome or to an empty chromosome. Thus by padding the two genomes to the same number 
of chromosomes, we only need to consider perfect matchings as permutations. The time complexity
of Algorithm 4 is $O(k! \, n^2)$, with $O(n^2)$ time for each of the $k!$ permutations. This completes the
proof of Theorem 4.
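
Algorithm 4 translates almost directly into Python. The sketch below is an illustration with a hypothetical function name and genomes encoded as lists of sets; it pads the smaller genome with empty chromosomes and tries every permutation.

```python
from itertools import permutations


def zero_exemplar_unordered_fpt(g1, g2):
    """Algorithm 4: try all k! ways of pairing the chromosomes of g1 and g2."""
    genes = set().union(*g1, *g2)
    k = max(len(g1), len(g2))
    # Step 1: pad the smaller genome with empty chromosomes.
    a = [frozenset(s) for s in g1] + [frozenset()] * (k - len(g1))
    b = [frozenset(s) for s in g2] + [frozenset()] * (k - len(g2))
    # Steps 2 and 3: look for a permutation whose matched intersections
    # together include every gene.
    for perm in permutations(range(k)):
        covered = set()
        for i, j in enumerate(perm):
            covered |= a[i] & b[j]
        if covered == genes:
            return True
    return False
```

As a usage example, the clause gadget of Section 3 (with the index $j$ dropped and primed genes written as `"a'"`, `"b'"`, `"c'"`) has no common reduced genome while all three literal genes are present, but gains one as soon as `"r"` is removed from every subset, in line with the crucial property used in the proof of Lemma 2.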

We remark that the problem Zero Exemplar Distance for multichromosomal genomes without
gene order is unlikely to have a fixed-parameter tractable algorithm if the parameter is the
maximum number of genes in any single chromosome. This is because 3SAT remains NP-hard even
if for each variable there are at most five clauses that contain its literals [9]. As a result, the number
of genes in each chromosome need not be more than some constant in our reduction from 3SAT. 